Algorithms for Large-Scale Astronomical Problems
Abstract
Modern astronomical datasets keep growing: they already include billions of celestial objects and occupy terabytes of disk space. Meanwhile, many astronomical applications do not scale well to such large amounts of data, which raises the following question: how can we use modern computer science techniques to help astronomers better analyze large datasets? To answer this question, we applied various computer science techniques to provide fast, scalable solutions to astronomical problems, as follows.

We developed algorithms that work better with big data. We found that for some astronomical problems, the information a user requires for each query covers only a small proportion of the input dataset. We therefore carefully organized the data layout on disk to answer user queries quickly; the resulting technique handles datasets with billions of entries on a single desktop computer.

We made use of database techniques to store and retrieve data. We designed table schemas and query processing functions to maximize their performance on large datasets. Database features such as indexing and sorting further reduce the processing time of user queries.

We processed large data using modern distributed computing frameworks. We considered frameworks widely used in the astronomy world, such as the Message Passing Interface (MPI), as well as emerging frameworks such as MapReduce. The developed implementations scale well to tens of billions of objects on hundreds of compute cores.

During our research, we noticed that modern computer hardware helps solve some of the sub-problems we encountered. One example is Solid-State Drives (SSDs), whose random access is faster than that of regular hard disk drives. Another is Graphics Processing Units (GPUs), which, under the right circumstances, can achieve a higher level of parallelism than ordinary CPU clusters.

Some astronomical problems are machine learning and statistics problems. For example, the problem of distinguishing quasars from other similar astronomical objects can be formalized as a classification problem. In this thesis, we applied supervised learning techniques to the quasar detection problem. Additionally, in the context of big data, we evaluated existing active learning algorithms, which aim to reduce the total number of human labels required.

All the developed techniques are designed to work with datasets that contain billions of astronomical objects. We have tested them extensively on large datasets and report the running times. We believe interdisciplinary work between computer science and astronomy has great potential, especially given the trend toward big data.
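To make the active learning idea concrete, the sketch below shows pool-based uncertainty sampling, one common active learning strategy. The abstract does not say which algorithms the thesis evaluated, so this is only an illustrative assumption: the one-dimensional feature, the logistic model, and the names `fit_logistic`, `predict_proba`, and `active_learn` are all hypothetical and are not taken from the thesis.

```python
import math


def predict_proba(model, x):
    """Probability that x belongs to the positive class (e.g. "quasar")."""
    w, b = model
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def fit_logistic(xs, ys, steps=500, lr=0.5):
    """Fit a 1-D logistic regression by batch gradient descent.

    Toy stand-in for a real supervised classifier (assumption).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict_proba((w, b), x) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def active_learn(pool, oracle, seed_indices, budget):
    """Pool-based uncertainty sampling.

    pool: list of feature values; oracle: function returning the true
    label (stands in for a human labeler); seed_indices: items labeled
    up front; budget: number of extra labels we may request.
    Returns (model, labeled dict, list of queried indices in order).
    """
    labeled = {i: oracle(pool[i]) for i in seed_indices}
    queried = []
    for _ in range(budget):
        idx = list(labeled)
        model = fit_logistic([pool[i] for i in idx], [labeled[i] for i in idx])
        candidates = [i for i in range(len(pool)) if i not in labeled]
        if not candidates:
            break
        # Ask for the label the current model is least sure about,
        # i.e. the predicted probability closest to 0.5.
        pick = min(candidates,
                   key=lambda i: abs(predict_proba(model, pool[i]) - 0.5))
        labeled[pick] = oracle(pool[pick])
        queried.append(pick)
    idx = list(labeled)
    model = fit_logistic([pool[i] for i in idx], [labeled[i] for i in idx])
    return model, labeled, queried


if __name__ == "__main__":
    # Hypothetical toy data: a single feature, positive class when x > 0.
    pool = [-2.0, -1.5, -1.0, -0.2, 0.1, 0.3, 1.0, 1.5, 2.0]
    model, labeled, queried = active_learn(
        pool, lambda x: 1 if x > 0 else 0, seed_indices=[0, 8], budget=3)
    print("queried indices:", queried)
    print("labels used:", len(labeled), "of", len(pool))
```

The point of the sketch is the selection rule: labels are requested only for the objects nearest the current decision boundary, so the labeling effort concentrates where it changes the classifier most. On billions of objects the retraining and scoring steps would need the scalable machinery described in the abstract; this toy version refits from scratch on every round.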
Similar resources
COMPUTATIONALLY EFFICIENT OPTIMUM DESIGN OF LARGE SCALE STEEL FRAMES
The computational cost of metaheuristic-based optimum design algorithms grows excessively with structure size. This results in computational inefficiency of modern metaheuristic algorithms in tackling optimum design problems of large scale structural systems. This paper attempts to provide a computationally efficient optimization tool for optimum design of large scale steel frame structures to AISC...
A New Play-off Approach in League Championship Algorithm for Solving Large-Scale Support Vector Machine Problems
There are numerous methods for solving large-scale problems, some of which are very flexible and efficient in both linear and non-linear cases. The league championship algorithm is one such algorithm that may be used for these problems. In the current paper, a new play-off approach is adapted to the league championship algorithm for solving large-scale problems. The proposed algori...
CONSTRAINED BIG BANG-BIG CRUNCH ALGORITHM FOR OPTIMAL SOLUTION OF LARGE SCALE RESERVOIR OPERATION PROBLEM
A constrained version of the Big Bang-Big Crunch algorithm for the efficient solution of optimal reservoir operation problems is proposed in this paper. The Big Bang-Big Crunch (BB-BC) algorithm is a new meta-heuristic population-based algorithm that relies on one of the theories of the evolution of the universe, namely the Big Bang and Big Crunch theory. An improved formulation of the algorithm na...
Solving Re-entrant No-wait Flexible Flowshop Scheduling Problem; Using the Bottleneck-based Heuristic and Genetic Algorithm
In this paper, we study the re-entrant no-wait flexible flowshop scheduling problem with makespan minimization objective and then consider two parallel machines for each stage. The main characteristic of a re-entrant environment is that at least one job is likely to visit certain stages more than once during the process. The no-wait property describes a situation in which every job has its own ...
Addressing a fixed charge transportation problem with multi-route and different capacities by novel hybrid meta-heuristics
In most real-world applications and problems, a homogeneous product is carried from an origin to a destination by using different transportation modes (e.g., road, air, rail and water). This paper investigates a fixed charge transportation problem (FCTP), in which there are different routes with different capacities between suppliers and customers. To solve such an NP-hard problem, four meta-heur...
Publication date: 2010